WELCOME!!¶
Welcome to my Time Series project. The goal of this project was to explore EDA techniques and statistical tests that are specific to time series data, and then use the knowledge gained from them to calculate granular forecasts for each week of every department in every Walmart store. This lets us filter by any desired group of stores and get a forecast immediately! As you can tell by looking at the Tableau dashboard on your left, it worked! And I thoroughly enjoyed the process! I achieved a MAPE of 2.39 on overall predictions!
To make navigation easy, you can use the table of contents on your left to jump to your desired section!
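For reference, MAPE is the mean absolute percentage error. A minimal sketch of how it can be computed (the function name and the zero-handling choice are mine, not taken from any library):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent.
    Zero actuals are excluded to avoid division by zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100)

print(round(mape([100, 200, 400], [98, 205, 392]), 2))  # -> 2.17
```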
Data description:¶
I had multiple CSV files, which I read into pandas DataFrames and then combined into one single DataFrame for ease of analysis. Here is some information about each CSV file.
stores.csv: This file contains anonymized information about the 45 stores, indicating the type (A, B or C) and size of store.
train.csv: This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file you will find the following fields:
Store - the store number
Dept - the department number
Date - the week
Weekly_Sales - sales for the given department in the given store
IsHoliday - whether the week is a special holiday week
test.csv: This file is identical to train.csv, except we have withheld the weekly sales. You must predict the sales for each triplet of store, department, and date in this file.
features.csv: This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:
Store - the store number
Date - the week
Temperature - average temperature in the region
Fuel_Price - cost of fuel in the region
MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any missing value is marked with an NA.
CPI - the consumer price index
Unemployment - the unemployment rate
IsHoliday - whether the week is a special holiday week
For convenience, the four holidays fall within the following weeks in the dataset (not all of the listed dates fall within the data's date range):
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labor Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
import pandas as pd
features = pd.read_csv('/content/drive/MyDrive/Forecasting Project/archive/features.csv')
stores = pd.read_csv('/content/drive/MyDrive/Forecasting Project/archive/stores.csv')
test = pd.read_csv('/content/drive/MyDrive/Forecasting Project/archive/test.csv')
train = pd.read_csv('/content/drive/MyDrive/Forecasting Project/archive/train.csv')
Phase 1 - Data Preparation¶
Here, I check for data integrity and then create a new feature called "WeeksToHoliday" that counts the number of weeks to the closest holiday. Finally, I create one big pandas dataframe called 'training_df' that will act as a "master dataset". This will make data-filtering by Store type and Department Category easier.
features.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8190 entries, 0 to 8189
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Store         8190 non-null   int64  
 1   Date          8190 non-null   object 
 2   Temperature   8190 non-null   float64
 3   Fuel_Price    8190 non-null   float64
 4   MarkDown1     4032 non-null   float64
 5   MarkDown2     2921 non-null   float64
 6   MarkDown3     3613 non-null   float64
 7   MarkDown4     3464 non-null   float64
 8   MarkDown5     4050 non-null   float64
 9   CPI           7605 non-null   float64
 10  Unemployment  7605 non-null   float64
 11  IsHoliday     8190 non-null   bool   
dtypes: bool(1), float64(9), int64(1), object(1)
memory usage: 712.0+ KB
stores.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ------
 0   Store   45 non-null     int64 
 1   Type    45 non-null     object
 2   Size    45 non-null     int64 
dtypes: int64(2), object(1)
memory usage: 1.2+ KB
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 421570 entries, 0 to 421569
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   Store         421570 non-null  int64  
 1   Dept          421570 non-null  int64  
 2   Date          421570 non-null  object 
 3   Weekly_Sales  421570 non-null  float64
 4   IsHoliday     421570 non-null  bool   
dtypes: bool(1), float64(1), int64(2), object(1)
memory usage: 13.3+ MB
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 115064 entries, 0 to 115063
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ------
 0   Store      115064 non-null  int64 
 1   Dept       115064 non-null  int64 
 2   Date       115064 non-null  object
 3   IsHoliday  115064 non-null  bool  
dtypes: bool(1), int64(2), object(1)
memory usage: 2.7+ MB
Seems like features.csv is the only file with missing values. Taking a quick look shows us the following.
features.isna().sum()
| Column | Missing Values |
|---|---|
| Store | 0 |
| Date | 0 |
| Temperature | 0 |
| Fuel_Price | 0 |
| MarkDown1 | 4158 |
| MarkDown2 | 5269 |
| MarkDown3 | 4577 |
| MarkDown4 | 4726 |
| MarkDown5 | 4140 |
| CPI | 585 |
| Unemployment | 585 |
| IsHoliday | 0 |
By observing our data, it can be noted that the missing values for CPI and Unemployment all fall in the test-set range of dates, so we can ignore them for now. MarkDown data is missing before November 2011 and isn't consistent afterwards either.
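This claim is easy to verify programmatically. A sketch of the check, shown on a small stand-in frame (hypothetical name `features_demo`) since the real CSV lives in Drive:

```python
import pandas as pd

# Stand-in for features.csv; in the real file, CPI/Unemployment
# are missing only for dates after the last training week.
features_demo = pd.DataFrame({
    'Date': ['2012-10-19', '2012-10-26', '2013-05-03', '2013-06-28'],
    'CPI':  [221.1, 221.2, None, None],
})
train_end = pd.Timestamp('2012-10-26')  # last week present in train.csv

missing_cpi_dates = pd.to_datetime(features_demo.loc[features_demo['CPI'].isna(), 'Date'])
print(bool((missing_cpi_dates > train_end).all()))  # -> True: all gaps are in the test period
```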
Step 1: Checking Data Integrity¶
Additionally, I ran some checks to make sure that all of our dataframes align with each other. I checked whether the set of stores is the same across all of our dataframes, then whether the weeks in our training data are all covered by the features data. Lastly, I also checked for duplicates within our dataframes.
# Check if all datasets have the same stores
train_stores = set(train['Store'].unique())
test_stores = set(test['Store'].unique())
features_stores = set(features['Store'].unique())
stores_stores = set(stores['Store'].unique())
print("Stores in all datasets match:", train_stores == test_stores == features_stores == stores_stores)
# Check if features covers all the weeks from train
train_weeks = set(train['Date'].unique())
features_weeks = set(features['Date'].unique())
print("All train weeks included in features data:", train_weeks <= features_weeks)
# Check for duplicate entries
print("Duplicate rows in training features:", features.duplicated().sum())
print("Duplicate rows in training data:", train.duplicated().sum())
print("Duplicate rows in testing data:", test.duplicated().sum())
print("Duplicate rows in stores data:", stores.duplicated().sum())
# Filling NA with 0 in features - currently not necessary since all values for training features present.
# features = features.fillna(0)
Stores in all datasets match: True
All train weeks included in features data: True
Duplicate rows in training features: 0
Duplicate rows in training data: 0
Duplicate rows in testing data: 0
Duplicate rows in stores data: 0
Our data is already in excellent condition! Now time for some feature engineering!
Step 2: Feature Engineering¶
The IsHoliday feature in our features.csv file is a boolean variable, which carries limited information since holiday weeks are few in number. Hence, I want to create a more descriptive variable: one that counts the number of weeks to the nearest holiday!
features.loc[features['IsHoliday'] == True, 'Date'].unique()
array(['2010-02-12', '2010-09-10', '2010-11-26', '2010-12-31',
'2011-02-11', '2011-09-09', '2011-11-25', '2011-12-30',
'2012-02-10', '2012-09-07', '2012-11-23', '2012-12-28',
'2013-02-08'], dtype=object)
The only holidays highlighted in our data are President's Day/Super Bowl, Labor Day, Thanksgiving, and Christmas/New Year's.
Now I will create a new feature in features.csv that counts the number of weeks since/until the closest holiday. Positive values indicate that the nearest holiday is coming up; negative values indicate that the nearest holiday has recently gone by.
# Get indices of holiday weeks
holiday_indices = features.index[features['IsHoliday']].to_list()
# Compute weeks to the nearest holiday. Note: this relies on features being sorted
# by Store then Date, with one row per store per week, so index distance = week distance.
features['WeeksToHoliday'] = features.index.to_series().apply(lambda i: min(holiday_indices, key=lambda h: abs(h - i)) - i)
print(features['WeeksToHoliday'].head(20))
0      1
1      0
2     -1
3     -2
4     -3
5     -4
6     -5
7     -6
8     -7
9     -8
10    -9
11   -10
12   -11
13   -12
14   -13
15   -14
16   -15
17    14
18    13
19    12
Name: WeeksToHoliday, dtype: int64
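The index-based computation works because features is sorted by store and week; a date-based variant avoids that ordering assumption entirely. A sketch on a stand-in frame (hypothetical name `features_demo`: one store, six consecutive weeks):

```python
import numpy as np
import pandas as pd

# Stand-in frame with a single holiday in week 2.
features_demo = pd.DataFrame({
    'Date': pd.to_datetime(['2010-02-05', '2010-02-12', '2010-02-19',
                            '2010-02-26', '2010-03-05', '2010-03-12']),
    'IsHoliday': [False, True, False, False, False, False],
})

holiday_dates = features_demo.loc[features_demo['IsHoliday'], 'Date'].to_numpy()

def weeks_to_nearest_holiday(date):
    # Signed distance in weeks: positive = holiday upcoming, negative = just passed.
    deltas = (holiday_dates - np.datetime64(date)) / np.timedelta64(7, 'D')
    return int(deltas[np.abs(deltas).argmin()])

features_demo['WeeksToHoliday'] = features_demo['Date'].apply(weeks_to_nearest_holiday)
print(features_demo['WeeksToHoliday'].tolist())  # -> [1, 0, -1, -2, -3, -4]
```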
For the sake of the upcoming EDA and forecasting algorithms, I will be creating lag variables and rolling averages: a 4-week and a 12-week version of each. I had to make sure these values are calculated separately for each department within each store.
# # Lag-4: Sales from 4 weeks ago
# train['Lag-4'] = train.groupby(['Store', 'Dept'])['Weekly_Sales'].shift(4)
# # print(train.iloc[10243])
# # Lag-12: Sales from 12 weeks ago
# train['Lag-12'] = train.groupby(['Store', 'Dept'])['Weekly_Sales'].shift(12)
# # Rolling average - 4 weeks
# train['Rolling_Avg-4'] = train.groupby(['Store', 'Dept'])['Weekly_Sales'].rolling(window=4, min_periods=1).mean().reset_index(level=[0, 1], drop=True)
# # Rolling average - 12 weeks
# train['Rolling_Avg-12'] = train.groupby(['Store', 'Dept'])['Weekly_Sales'].rolling(window=12, min_periods=1).mean().reset_index(level=[0, 1], drop=True)
# # Printing to make sure that the values are keeping track of the NaN values
# print(train.iloc[10242:10245])
# print(train.iloc[141:146])
NOTE TO SELF - WHEN USING THE ROLLING AVERAGE COLUMNS IN THE FUTURE FOR STORE-WISE VALUES, PLEASE REMEMBER TO AVERAGE THE RA FROM ALL DIFFERENT DEPARTMENTS.
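The commented-out lag/rolling logic above can be sketched on a toy frame to confirm that lags and rolling means stay within each Store/Dept group (the column names mirror the commented code; the toy data is purely illustrative):

```python
import pandas as pd

# Two stores, one department each, five weeks of toy sales.
toy = pd.DataFrame({
    'Store': [1]*5 + [2]*5,
    'Dept':  [1]*10,
    'Weekly_Sales': [10.0, 20.0, 30.0, 40.0, 50.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

g = toy.groupby(['Store', 'Dept'])['Weekly_Sales']
toy['Lag-4'] = g.shift(4)  # NaN until 4 weeks of history exist within the group
toy['Rolling_Avg-4'] = (g.rolling(window=4, min_periods=1).mean()
                          .reset_index(level=[0, 1], drop=True))

print(toy['Lag-4'].tolist())          # first 4 rows of each group are NaN
print(toy['Rolling_Avg-4'].tolist())  # -> [10.0, 15.0, 20.0, 25.0, 35.0, 5.0, 5.5, 6.0, 6.5, 7.5]
```

Note how the rolling mean restarts at each new group, so store 2 never sees store 1's history.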
Next, I will merge the three main files into one big file for future use in Tableau!
# Merge Train and Features on 'Store' and 'Date' columns, dropping IsHoliday once because it repeats in the two dfs.
training_df = pd.merge(train.drop('IsHoliday', axis=1), features, on=['Store', 'Date'], how='left')
# Merge the resulting combined dataframe with Store Data (on 'Store' column)
training_df = pd.merge(training_df, stores, on='Store', how='left')
print(training_df.isna().sum())
# print(training_df.head(35))
Store                  0
Dept                   0
Date                   0
Weekly_Sales           0
Temperature            0
Fuel_Price             0
MarkDown1         270889
MarkDown2         310322
MarkDown3         284479
MarkDown4         286603
MarkDown5         270138
CPI                    0
Unemployment           0
IsHoliday              0
WeeksToHoliday         0
Type                   0
Size                   0
dtype: int64
Exporting the CSV file to use in Tableau/SQL.
training_df.to_csv('Prepared_data.csv')
Phase 2 - EDA¶
This section has 4 major parts. First, I identify KPIs and general insights about the overall data (e.g. total sales, top 10 and bottom 10 stores and departments, and sales and store size for each store type).
In Steps 2 and 3, I try understanding how different types of stores correspond to different store sizes and weekly sales, and how they differ in terms of sales volume, trend and seasonality.
In the final step, I analyse the departments across all the stores and group them into different categories based on their sales volume and volatility.
Step 1 - KPIs and General EDA¶
# Ensure the 'Date' column is in datetime format
training_df['Date'] = pd.to_datetime(training_df['Date'])
# Basic KPIs
print(f"Total Sales: {training_df['Weekly_Sales'].sum():,.2f}")
print('')
print(f"Date Range: {training_df['Date'].min()} to {training_df['Date'].max()}")
print('')
print(f"Number of Stores: {training_df['Store'].nunique()}")
print('')
print(f"Average Store Size: {training_df['Size'].mean():,.2f}")
print('')
# Sales by Store
store_sales = training_df.groupby('Store')['Weekly_Sales'].sum().reset_index()
print(f"Top 10 Performing Stores:\n{store_sales.sort_values(by='Weekly_Sales', ascending=False).head(10)}")
print('')
print(f"Bottom 10 Stores:\n{store_sales.sort_values(by='Weekly_Sales', ascending=False).tail(10)}")
print('')
# Sales by Week
weekly_sales = training_df.groupby('Date')['Weekly_Sales'].sum().reset_index()
print(f"Top 10 Best Weeks:\n{weekly_sales.sort_values(by='Weekly_Sales', ascending=False).head(10)}")
print('')
print(f"Bottom 10 Weeks:\n{weekly_sales.sort_values(by='Weekly_Sales', ascending=False).tail(10)}")
print('')
# Sales by Department
dep_sales = training_df.groupby('Dept')['Weekly_Sales'].sum().reset_index()
print(f"Top 10 Best Departments:\n{dep_sales.sort_values(by='Weekly_Sales', ascending=False).head(10)}")
print('')
print(f"Bottom 10 Departments:\n{dep_sales.sort_values(by='Weekly_Sales', ascending=False).tail(10)}")
print('')
# Sales by Store Type
type_sales = training_df.groupby('Type')['Weekly_Sales'].sum().reset_index()
print(f"Sales by Store Type:\n{type_sales}")
print('')
# Average Weekly Sales by Type
avg_weekly_sales_by_type = training_df.groupby('Type')['Weekly_Sales'].mean().reset_index()
print(f"Average Weekly Sales by Store Type:\n{avg_weekly_sales_by_type}")
print('')
# Store Size by Type
store_size_by_type = training_df.groupby('Type')['Size'].mean().reset_index()
print(f"Average Store Size by Store Type:\n{store_size_by_type}")
print('')
Total Sales: 6,737,218,987.11
Date Range: 2010-02-05 00:00:00 to 2012-10-26 00:00:00
Number of Stores: 45
Average Store Size: 136,727.92
Top 10 Performing Stores:
Store Weekly_Sales
19 20 3.013978e+08
3 4 2.995440e+08
13 14 2.889999e+08
12 13 2.865177e+08
1 2 2.753824e+08
9 10 2.716177e+08
26 27 2.538559e+08
5 6 2.237561e+08
0 1 2.224028e+08
38 39 2.074455e+08
Bottom 10 Stores:
Store Weekly_Sales
28 29 77141554.31
15 16 74252425.40
36 37 74202740.32
29 30 62716885.12
2 3 57586735.07
37 38 55159626.42
35 36 53412214.97
4 5 45475688.90
43 44 43293087.84
32 33 37160221.96
Top 10 Best Weeks:
Date Weekly_Sales
46 2010-12-24 80931415.60
98 2011-12-23 76998241.31
94 2011-11-25 66593605.26
42 2010-11-26 65821003.24
45 2010-12-17 61820799.85
97 2011-12-16 60085695.94
44 2010-12-10 55666770.39
96 2011-12-09 55561147.70
113 2012-04-06 53502315.87
126 2012-07-06 51253021.88
Bottom 10 Weeks:
Date Weekly_Sales
34 2010-10-01 42239875.87
86 2011-09-30 42195830.81
102 2012-01-20 42080996.56
101 2012-01-13 42023078.48
33 2010-09-24 41358514.41
49 2011-01-14 40673678.04
50 2011-01-21 40654648.03
47 2010-12-31 40432519.00
103 2012-01-27 39834974.67
51 2011-01-28 39599852.99
Top 10 Best Departments:
Dept Weekly_Sales
73 92 4.839433e+08
76 95 4.493202e+08
36 38 3.931181e+08
60 72 3.057252e+08
71 90 2.910685e+08
38 40 2.889360e+08
1 2 2.806112e+08
72 91 2.167817e+08
12 13 1.973216e+08
7 8 1.942808e+08
Bottom 10 Departments:
Dept Weekly_Sales
56 60 2005020.96
51 54 516294.63
80 99 358149.85
62 77 49344.27
43 45 44937.63
49 51 30572.83
63 78 1714.71
37 39 177.98
41 43 14.32
45 47 -4962.93
Sales by Store Type:
Type Weekly_Sales
0 A 4.331015e+09
1 B 2.000701e+09
2 C 4.055035e+08
Average Weekly Sales by Store Type:
Type Weekly_Sales
0 A 20099.568043
1 B 12237.075977
2 C 9519.532538
Average Store Size by Store Type:
Type Size
0 A 182231.285486
1 B 101818.735827
2 C 40535.725286
First we look at the combined sales of all 45 Walmart stores. We can see two major spikes each year, around Thanksgiving and around Christmas/New Year's.
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(18,6))
ax = sns.lineplot(data=training_df, x='Date', y='Weekly_Sales', estimator='sum')
plt.title("Total Weekly Sales Over Time")
# Adjust x-ticks to show only every alternate label
xticks = ax.get_xticks()
ax.set_xticks(xticks[::2]) # Show every alternate tick
plt.xticks(rotation=90)
plt.show()
Step 2 - Store and Size Impact¶
I first try to understand the meaning behind the 'Type' feature. We look at summary stats of weekly sales and store size for each type.
for store_type in training_df['Type'].unique():
print(f"Weekly Department-wise Summary statistics for Store Type {store_type}:\n")
# Filter by store type
store_df = training_df[training_df['Type'] == store_type]
# Summary stats for weekly sales and store size
summary_stats = store_df[['Weekly_Sales', 'Size']].describe()
print(summary_stats)
print("-" * 50)
Weekly Department-wise Summary statistics for Store Type A:
Weekly_Sales Size
count 215478.000000 215478.000000
mean 20099.568043 182231.285486
std 26423.457227 41534.529330
min -4988.940000 39690.000000
25% 3315.090000 158114.000000
50% 10105.170000 202505.000000
75% 26357.180000 203819.000000
max 474330.100000 219622.000000
--------------------------------------------------
Weekly Department-wise Summary statistics for Store Type B:
Weekly_Sales Size
count 163495.000000 163495.000000
mean 12237.075977 101818.735827
std 17203.668989 30921.779415
min -3924.000000 34875.000000
25% 1927.055000 93188.000000
50% 6187.870000 114533.000000
75% 15353.740000 123737.000000
max 693099.360000 140167.000000
--------------------------------------------------
Weekly Department-wise Summary statistics for Store Type C:
Weekly_Sales Size
count 42597.000000 42597.000000
mean 9519.532538 40535.725286
std 15985.351612 1194.434302
min -379.000000 39690.000000
25% 131.990000 39690.000000
50% 1149.670000 39910.000000
75% 12695.010000 41062.000000
max 112152.350000 42988.000000
--------------------------------------------------
It is evident that stores are divided into types based on their average weekly sales and size. However, the maximum weekly sales recorded in type B exceed those of every store in type A, highlighting an interesting store.
Now, I take a look at the overall total sales of each of our stores and rank them. Then I plot them in a bar graph, colored by store type. The bar graph helps in visually identifying any outliers we might have.
# Aggregate total weekly sales per store
store_sales = training_df.groupby("Store")["Weekly_Sales"].sum().reset_index()
store_sales["Store_Rank"] = store_sales["Weekly_Sales"].rank(ascending=False, method="dense").astype(int)
# Add store type information (assuming 'Type' column exists in training_df)
store_types = training_df[["Store", "Type"]].drop_duplicates()
store_sales = store_sales.merge(store_types, on="Store", how="left") # Merge with rankings
store_sales = store_sales.sort_values(by = "Store_Rank", ascending = True)
# Define color mapping for store types
color_mapping = {'A': 'blue', 'B': 'orange', 'C': 'green'}
# Create bar plot with colors based on store type
plt.figure(figsize=(12,6))
bars = sns.barplot(data=store_sales, x='Store_Rank', y='Weekly_Sales', hue='Type', palette=color_mapping)
# Create a list of labels for each container
labels = [store_sales["Store"][store_sales["Type"] == store_type].tolist() for store_type in store_sales["Type"].unique()]
# Label each container with the corresponding labels
for container, label in zip(bars.containers, labels):
bars.bar_label(container, labels=label)
plt.title("Total Sales per Store Colored by Store Type (Store ID on Bars)")
plt.xlabel("Store Rank")
plt.ylabel("Total Sales")
plt.legend(title="Store Type")
plt.show()
Immediately, we see stores 10 and 23 stand out from type B, and stores 36 and 33 stand out from type A. Stores 10 and 23 (especially store 10) have sales comparable to type A stores, while stores 36 and 33 are doing worse than most type C stores.
While we now have a better idea of the stores, I would still like to quantify these observations. Are stores 10 and 23 significantly bigger than other type B stores? Are stores 36 and 33 smaller than all other type A stores? Does that explain the anomalies? I answer these questions next.
aggregated_df = training_df.groupby(['Store', 'Type'], as_index=False).agg({
'Weekly_Sales': 'sum',
'Size': 'first' # Assuming store size remains constant for each store
})
for store_type in aggregated_df['Type'].unique():
print(f"Summary statistics for Store Type {store_type}:\n")
# Filter by store type
store_df = aggregated_df[aggregated_df['Type'] == store_type]
# Summary stats for aggregated weekly sales and store size
summary_stats = store_df[['Weekly_Sales', 'Size']].describe()
# Format numbers for better readability
formatted_stats = summary_stats.map(lambda x: f"{x:,.2f}")
print(formatted_stats)
print("-" * 50)
Summary statistics for Store Type A:
Weekly_Sales Size
count 22.00 22.00
mean 196,864,305.58 177,247.73
std 72,556,192.97 49,392.62
min 37,160,221.96 39,690.00
25% 149,267,106.88 155,840.75
50% 196,814,963.39 202,406.00
75% 246,330,970.32 203,819.00
max 301,397,792.46 219,622.00
--------------------------------------------------
Summary statistics for Store Type B:
Weekly_Sales Size
count 17.00 17.00
mean 117,688,278.64 101,190.71
std 55,898,276.21 32,371.14
min 45,475,688.90 34,875.00
25% 77,789,218.99 93,188.00
50% 108,117,878.92 114,533.00
75% 144,287,230.15 123,737.00
max 271,617,713.89 140,167.00
--------------------------------------------------
Summary statistics for Store Type C:
Weekly_Sales Size
count 6.00 6.00
mean 67,583,921.26 40,541.67
std 17,225,671.35 1,304.15
min 43,293,087.84 39,690.00
25% 57,048,941.09 39,745.00
50% 68,459,812.72 39,910.00
75% 78,224,999.40 40,774.00
max 90,565,435.41 42,988.00
--------------------------------------------------
print(training_df[['Store', 'Dept', 'Date', 'Weekly_Sales', 'Size']][training_df['Weekly_Sales'] > 693090])
print(aggregated_df[aggregated_df['Store'] == 10])
print(aggregated_df[aggregated_df['Store'] == 23])
print(aggregated_df[aggregated_df['Store'] == 36])
print(aggregated_df[aggregated_df['Store'] == 33])
Store Dept Date Weekly_Sales Size
95373 10 72 2010-11-26 693099.36 126512
Store Type Weekly_Sales Size
9 10 B 2.716177e+08 126512
Store Type Weekly_Sales Size
22 23 B 1.987506e+08 114533
Store Type Weekly_Sales Size
35 36 A 53412214.97 39910
Store Type Weekly_Sales Size
32 33 A 37160221.96 39690
LOOK AT TABLEAU FOR A CHART SIMILAR TO THE ABOVE ONE, BUT FOR STORE SIZE.
In conclusion, we notice that stores 10 and 23 are two of the largest stores in type B; however, they are barely big enough to be considered type A, and don't even reach the first quartile of type A sizes.
On the other hand, stores 36 and 33 are so small in both size and sales that they should arguably be classified as Type C. It would be interesting to look at the sales trends for these two stores over the training period.
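If one did decide to re-label stores 33 and 36, the change itself is a one-liner. A sketch on a stand-in frame (hypothetical name `stores_demo`; whether to actually relabel is a modeling judgment call):

```python
import pandas as pd

# Stand-in frame containing the two outlier stores.
stores_demo = pd.DataFrame({'Store': [1, 33, 36, 40], 'Type': ['A', 'A', 'A', 'B']})

# Hypothetical relabel: treat stores 33 and 36 as Type C going forward.
stores_demo.loc[stores_demo['Store'].isin([33, 36]), 'Type'] = 'C'
print(stores_demo['Type'].tolist())  # -> ['A', 'C', 'C', 'B']
```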
Step 3 - Seasonality and Trends¶
I wanted to look at the most popular weeks across our different store types.
# List of store types
store_types = ['A', 'B', 'C']
# Loop through each store type and find the top 10 weeks with the most sales
for store_type in store_types:
# Filter for the current store type and aggregate weekly sales by date
weekly_sales_overall = training_df[training_df['Type'] == store_type].groupby('Date')['Weekly_Sales'].sum().reset_index()
# Get the top 10 weeks with the most sales
top_10_weeks_overall = weekly_sales_overall.sort_values(by='Weekly_Sales', ascending=False).head(10)
# Display the results
print(f"Top 10 Weeks with the Most Sales (for Type {store_type} stores):")
print(top_10_weeks_overall)
print("-" * 50) # Separator for better readability
Top 10 Weeks with the Most Sales (for Type A stores):
Date Weekly_Sales
46 2010-12-24 51499373.00
98 2011-12-23 49252109.29
94 2011-11-25 41922781.18
42 2010-11-26 41448457.21
45 2010-12-17 39179423.01
97 2011-12-16 38249734.51
96 2011-12-09 35683057.82
44 2010-12-10 35567817.25
113 2012-04-06 34748847.73
126 2012-07-06 33219302.49
--------------------------------------------------
Top 10 Weeks with the Most Sales (for Type B stores):
Date Weekly_Sales
46 2010-12-24 26389959.94
98 2011-12-23 24600399.99
94 2011-11-25 21745597.80
42 2010-11-26 21567271.42
45 2010-12-17 19895454.43
97 2011-12-16 18881917.62
44 2010-12-10 17341040.94
96 2011-12-09 16927390.24
43 2010-12-03 15743469.37
113 2012-04-06 15597038.30
--------------------------------------------------
Top 10 Weeks with the Most Sales (for Type C stores):
Date Weekly_Sales
113 2012-04-06 3156429.84
98 2011-12-23 3145732.03
100 2012-01-06 3126550.94
105 2012-02-10 3110690.90
135 2012-09-07 3070635.59
101 2012-01-13 3056745.85
117 2012-05-04 3054837.24
126 2012-07-06 3053920.22
46 2010-12-24 3042082.66
122 2012-06-08 3040216.34
--------------------------------------------------
Interestingly, the top 10 weeks for Types A and B are nearly identical - both are concentrated in November and December (Thanksgiving and Christmas/New Year's). However, the top 10 for Type C is spread across many different months. It would be interesting to look at the sales trends as before, but broken down by type.
weekly_sales_overall = training_df.groupby('Date')['Weekly_Sales'].sum().reset_index()
for store_type in store_types:
# Filter for the current store type and aggregate weekly sales by date
weekly_sales_overall = training_df[training_df['Type'] == store_type].groupby('Date')['Weekly_Sales'].sum().reset_index()
plt.figure(figsize=(12,4))
ax = sns.lineplot(data=weekly_sales_overall, x='Date', y='Weekly_Sales')
plt.title(f"Total Weekly Sales for {store_type} Over Time")
# Adjust x-ticks to show only every alternate label
xticks = ax.get_xticks()
ax.set_xticks(xticks[::2]) # Show every alternate tick
plt.xticks(rotation=90)
plt.show()
As expected, the trend lines for Types A and B are nearly identical. However, the trend line for Type C is completely different: it looks almost stationary, and the holidays don't seem to affect its sales much.
For more accurate forecasting, I will train my forecasting algorithm on Type C separately.
Next, I would like to know whether stores 3, 5, 36 and 33 follow trend lines for Type C, and whether stores 43 and 42 follow trend lines for Type A/B.
# Create a combined plot for all store types
plt.figure(figsize=(18, 6))
store_list_1 = [3, 5, 36, 33, 43, 42]
for store in store_list_1:
# Filter for the current store type and aggregate weekly sales by date
weekly_sales_overall = training_df[training_df['Store'] == store].groupby('Date')['Weekly_Sales'].sum().reset_index()
# Add a line for each store type
ax = sns.lineplot(data=weekly_sales_overall, x='Date', y='Weekly_Sales', label=f"Store No. {store}")
# Set title and axis labels
plt.title("Total Weekly Sales Over Time for underperforming stores")
plt.xlabel("Date")
plt.ylabel("Weekly Sales")
# Adjust x-ticks to show every alternate label
xticks = ax.get_xticks()
ax.set_xticks(xticks[::2]) # Show every alternate tick
plt.xticks(rotation=90)
# Display the legend to differentiate store types
plt.legend(title="Underperforming stores")
# Show the final combined plot
plt.show()
Yet again, we get some interesting results. It turns out that stores 3 and 5, even with their Type C-level sales performance, show a seasonal sales pattern consistent with Types A/B. However, stores 36 and 33, the Type A stores that we earlier saw have the sales performance and size of a Type C, also BEHAVE like Type C stores. Hence, I would strongly consider re-labeling stores 36 and 33.
Finally, stores 43 and 42, despite their high sales performance, don't show the dramatic seasonality of Types A/B.
So far, we have successfully identified seasonality. It's now time to identify trends within our data, to see whether the stores are improving or getting worse.
from statsmodels.tsa.seasonal import STL
weekly_sales_overall = training_df.groupby('Date')['Weekly_Sales'].sum().reset_index()
# Set 'Date' as the index for STL decomposition
weekly_sales_overall.set_index('Date', inplace=True)
# Perform STL decomposition
stl = STL(weekly_sales_overall['Weekly_Sales'], period=52)
result = stl.fit()
# Plot the original data and the trend
plt.figure(figsize=(6, 6))
plt.plot(weekly_sales_overall.index, weekly_sales_overall['Weekly_Sales'], label='Original Sales', alpha=0.6)
plt.plot(weekly_sales_overall.index, result.trend, label='Trend', color='red')
plt.title('STL Decomposition: Weekly Sales and Trend')
plt.xlabel('Date')
plt.ylabel('Weekly Sales')
plt.legend()
plt.xticks(rotation=90)
plt.show()
# List of store types
store_types = ['A', 'B', 'C']
# Loop through each store type
for store_type in store_types:
print(f"Processing Store Type {store_type}...")
# Filter data for the current store type
store_type_df = training_df[training_df['Type'] == store_type]
# Aggregate weekly sales by date
weekly_sales_by_type = store_type_df.groupby('Date')['Weekly_Sales'].sum().reset_index()
# Ensure data is sorted by date
weekly_sales_by_type = weekly_sales_by_type.sort_values('Date')
# Set the index for STL decomposition
weekly_sales_by_type.set_index('Date', inplace=True)
# Perform STL decomposition (assuming weekly data with 52 periods in a year)
stl = STL(weekly_sales_by_type['Weekly_Sales'], period=52)
result = stl.fit()
# Plot the original data and the trend for this store type
plt.figure(figsize=(6, 6))
plt.plot(weekly_sales_by_type.index, weekly_sales_by_type['Weekly_Sales'], label='Original Sales', alpha=0.6)
plt.plot(weekly_sales_by_type.index, result.trend, label='Trend', color='red')
plt.title(f'STL Decomposition: Weekly Sales and Trend for Store Type {store_type}')
plt.xlabel('Date')
plt.ylabel('Weekly Sales')
plt.legend()
plt.xticks(rotation=90)
plt.show()
Processing Store Type A...
Processing Store Type B...
Processing Store Type C...
Step 4 - Department-wise performances¶
It was clear early on that some department numbers are missing from the data. These are the absent ones:
import numpy as np
# missing departments
set(np.arange(1, 100)) - set(training_df["Dept"].unique())
{15, 53, 57, 61, 62, 63, 64, 66, 68, 69, 70, 73, 75, 76, 84, 86, 88, 89}
training_df[["Date", "Store", "Dept", "Weekly_Sales", "Size", "Type"]].sort_values(by = "Weekly_Sales", ascending = False).head(10)
|   | Date | Store | Dept | Weekly_Sales | Size | Type |
|---|---|---|---|---|---|---|
| 95373 | 2010-11-26 | 10 | 72 | 693099.36 | 126512 | B |
| 338013 | 2011-11-25 | 35 | 72 | 649770.18 | 103681 | B |
| 95425 | 2011-11-25 | 10 | 72 | 630999.19 | 126512 | B |
| 337961 | 2010-11-26 | 35 | 72 | 627962.93 | 103681 | B |
| 135665 | 2010-11-26 | 14 | 72 | 474330.10 | 200898 | A |
| 195088 | 2010-11-26 | 20 | 72 | 422306.25 | 203742 | A |
| 264390 | 2010-11-26 | 27 | 72 | 420586.57 | 204184 | A |
| 88428 | 2010-12-24 | 10 | 7 | 406988.63 | 126512 | B |
| 95377 | 2010-12-24 | 10 | 72 | 404245.03 | 126512 | B |
| 214432 | 2010-11-26 | 22 | 72 | 393705.20 | 119557 | B |
When we rank our weekly sales data and look at the top 10 store-department weeks, two things stand out. First, the list is dominated by two stores of Type B (stores 10 and 35). Second, it is absolutely dominated by one department: Department 72. However, despite dominating the top 10, Department 72 only has the 4th highest overall sales. This implies that Department 72 has a rather erratic sales pattern that flares up during holidays but dies down at other times.
This made me wonder - can we split our departments based on their total sales volume AND sales volatility? With this in mind, I categorized the departments into five categories: Reliable, Weak, Erratic Declining, Erratic Increasing, and Standard.
Reliable = High sales volume (top quartile), low volatility and a non-negative trend slope
Weak = Low sales volume (bottom quartile) and a declining trend slope
Erratic Declining = High volatility and a declining trend slope
Erratic Increasing = High volatility but a non-declining trend slope
Standard = Everything else, with all-round standard values
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import STL
# Ensure 'Date' is in datetime format
training_df['Date'] = pd.to_datetime(training_df['Date'])
# Step 1: Aggregate sales by department and date
dept_sales = training_df.groupby(['Dept', 'Date'])['Weekly_Sales'].sum().reset_index()
# Step 2: Perform STL decomposition and collect metrics
stl_results = []
for dept in dept_sales['Dept'].unique():
    dept_data = dept_sales[dept_sales['Dept'] == dept].sort_values('Date')
    # Note: STL assumes a regular weekly series; missing dates would distort the fit
    dept_data = dept_data.reset_index(drop=True)
    # Perform STL decomposition (weekly data, so period=52 for annual seasonality)
    stl = STL(dept_data['Weekly_Sales'], period=52, robust=True)
    result = stl.fit()
    # Collect STL components
    total_sales = dept_data['Weekly_Sales'].sum()
    trend_slope = np.mean(np.diff(result.trend))  # Approximate trend slope
    volatility = np.std(result.resid)  # Standard deviation of the residuals
    stl_results.append({
        'Dept': dept,
        'Total_Sales': total_sales,
        'Trend_Slope': trend_slope,
        'Volatility': volatility
    })
# Step 3: Create a summary DataFrame
dept_summary = pd.DataFrame(stl_results)
# Step 4: Define thresholds for classification (quantile-based)
sales_upper = dept_summary['Total_Sales'].quantile(0.75)
sales_lower = dept_summary['Total_Sales'].quantile(0.25)
volatility_threshold = dept_summary['Volatility'].quantile(0.75)
# Classify departments based on sales, trend, and volatility
def classify_department(row):
    if row['Total_Sales'] >= sales_upper and row['Volatility'] <= volatility_threshold and row['Trend_Slope'] >= 0:
        return 'Reliable'
    elif row['Total_Sales'] < sales_lower and row['Trend_Slope'] < 0:
        return 'Weak'
    elif row['Volatility'] > volatility_threshold and row['Trend_Slope'] < 0:
        return 'Erratic Declining'
    elif row['Volatility'] > volatility_threshold:
        return 'Erratic Increasing'
    else:
        return 'Standard'
dept_summary['Category'] = dept_summary.apply(classify_department, axis=1)
# Step 5: Display results
print(dept_summary.head())
# Step 6: Visualize department classification
plt.figure(figsize=(12, 8))
sns.scatterplot(data=dept_summary, x='Total_Sales', y='Volatility', hue='Category', palette='Set1')
plt.title('Department Classification: Popular vs. Weaker vs. Erratic')
plt.xlabel('Total Sales')
plt.ylabel('Volatility (Residual Std. Dev.)')
plt.legend(title='Category')
plt.show()
# Optional: Check counts of each category
print(dept_summary['Category'].value_counts())
   Dept   Total_Sales  Trend_Slope    Volatility            Category
0     1  1.236388e+08   142.200778  27067.048186  Erratic Increasing
1     2  2.806112e+08   745.089924  24157.688087  Erratic Increasing
2     3  7.589245e+07   143.139762  12618.375692            Standard
3     4  1.671467e+08   441.783655  16205.397030            Reliable
4     5  1.356074e+08 -1709.349656  59528.493092   Erratic Declining
Category
Standard              43
Erratic Increasing    12
Weak                  12
Erratic Declining      8
Reliable               6
Name: count, dtype: int64
for category in dept_summary['Category'].unique():
    print(f"{category} departments: {dept_summary[dept_summary['Category'] == category]['Dept'].unique()}")
Erratic Increasing departments: [ 1  2  9 18 23 38 40 79 90 92 94 95]
Standard departments: [ 3  6 12 14 17 20 21 22 24 25 26 27 29 30 31 32 33 35 39 41 42 44 47 48 49 50 52 56 58 60 67 71 74 80 81 82 83 85 87 96 97 98 99]
Reliable departments: [ 4  8 10 46 91 93]
Erratic Declining departments: [ 5  7 11 13 16 34 55 72]
Weak departments: [19 28 36 37 43 45 51 54 59 65 77 78]
# Step 7: Plot the trends for all categories on the same graph
dept_sales = dept_sales.merge(dept_summary[['Dept', 'Category']], on='Dept', how='left')
category_sales_over_time = dept_sales.groupby(['Date', 'Category'])['Weekly_Sales'].sum().reset_index()
plt.figure(figsize=(14, 8))
sns.lineplot(data=category_sales_over_time, x='Date', y='Weekly_Sales', hue='Category', palette='Set1')
plt.title('Weekly Sales Trends for Popular, Weaker, and Erratic Departments')
plt.xlabel('Date')
plt.ylabel('Total Weekly Sales')
plt.legend(title='Category')
plt.show()
For ease of data analysis, I will be adding 'Category' as a feature in training_df so that we can filter out data for different categories with simple pandas code.
training_df = training_df.merge(dept_summary[['Dept', 'Category']], on='Dept', how='left')
training_df.head()
| | Store | Dept | Date | Weekly_Sales | Temperature | Fuel_Price | MarkDown1 | MarkDown2 | MarkDown3 | MarkDown4 | MarkDown5 | CPI | Unemployment | IsHoliday | WeeksToHoliday | Type | Size | Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2010-02-05 | 24924.50 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 | False | 1 | A | 151315 | Erratic Increasing |
| 1 | 1 | 1 | 2010-02-12 | 46039.49 | 38.51 | 2.548 | NaN | NaN | NaN | NaN | NaN | 211.242170 | 8.106 | True | 0 | A | 151315 | Erratic Increasing |
| 2 | 1 | 1 | 2010-02-19 | 41595.55 | 39.93 | 2.514 | NaN | NaN | NaN | NaN | NaN | 211.289143 | 8.106 | False | -1 | A | 151315 | Erratic Increasing |
| 3 | 1 | 1 | 2010-02-26 | 19403.54 | 46.63 | 2.561 | NaN | NaN | NaN | NaN | NaN | 211.319643 | 8.106 | False | -2 | A | 151315 | Erratic Increasing |
| 4 | 1 | 1 | 2010-03-05 | 21827.90 | 46.50 | 2.625 | NaN | NaN | NaN | NaN | NaN | 211.350143 | 8.106 | False | -3 | A | 151315 | Erratic Increasing |
Now that we are done with analysing data from different perspectives, I would now like to focus on some statistical tests for choosing appropriate forecasting methods.
Phase 3 - Statistical Tests for Forecasting¶
In this phase, I perform five major statistical tests on the overall data, plus some tests on type-wise and category-wise data. The five tests check for stationarity, auto-correlation and partial auto-correlation, heteroscedasticity, normality, and feature correlation.
In addition to the above five tests, I also performed a Hotelling T-squared test to get strong statistical evidence of misclassification for stores 33 and 36. The reasoning is made clearer in Step 5!
Step 1 - Stationarity ADF test¶
From the above analysis, we saw clear seasonality in store types A and B, and in all department categories, so all of those classes are clearly non-stationary. However, we didn't see any strong seasonality in Type C stores. So while they did show a trend pattern, it would be interesting to see whether the Type C stores are stationary. This can be done with an ADF test, which checks for non-seasonal stationarity.
from statsmodels.tsa.stattools import adfuller
type_c = training_df[training_df['Type'] == 'C'].groupby('Date')['Weekly_Sales'].sum().reset_index()
X = type_c['Weekly_Sales'].values
result = adfuller(X)
print('ADF Statistic for Type C Stores: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic for Type C Stores: -1.306509
p-value: 0.626189
Critical Values:
	1%: -3.481
	5%: -2.884
	10%: -2.579
Since the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that Type C stores show a non-stationary sales pattern, most likely driven by their increasing trend.
store_data_diff = type_c['Weekly_Sales'].diff(1).dropna()  # First-order differencing (lag 1)
X = store_data_diff.values
result = adfuller(X)
print('ADF Statistic for Type C Stores: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic for Type C Stores: -5.426958
p-value: 0.000003
Critical Values:
	1%: -3.481
	5%: -2.884
	10%: -2.578
After first-order differencing, the p-value falls well below 0.05, so we can reject the null hypothesis: the differenced Type C series is stationary. This supports the idea that the non-stationarity came from trend rather than seasonality.
Step 2 - ACF/PACF¶
The Auto-Correlation Function (ACF) and Partial Auto-Correlation Function (PACF) identify lag-based relationships in time-series data. These help select parameters for ARIMA/SARIMA models. When used on raw non-stationary data, they can also reveal seasonality.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Aggregate weekly sales at the store level
store_sales = training_df.groupby("Date")["Weekly_Sales"].sum().reset_index()
# Convert 'Date' column to datetime
store_sales["Date"] = pd.to_datetime(store_sales["Date"])
# Set date as index
store_sales.set_index("Date", inplace=True)
# Function to check stationarity using ADF Test
def adf_test(timeseries):
    result = adfuller(timeseries)
    print(f"ADF Statistic: {result[0]}")
    print(f"p-value: {result[1]}")
    return result[1] <= 0.05  # Returns True if stationary
# Checking stationarity of raw data
print("Checking stationarity...")
is_stationary = adf_test(store_sales)
# Plot ACF for raw data
plt.figure(figsize=(18, 5))
plot_acf(store_sales["Weekly_Sales"], lags=105, ax=plt.gca())
plt.title("ACF - Raw Data")
plt.show()
# Function to plot ACF/PACF
def plot_acf_pacf(timeseries, title):
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    plot_acf(timeseries, lags=60, ax=axes[0])
    plot_pacf(timeseries, lags=44, ax=axes[1])
    axes[0].set_title(f'ACF - {title}')
    axes[1].set_title(f'PACF - {title}')
    plt.show()
print("Applying differencing...")
store_data_diff = store_sales.diff(52).dropna() # Seasonal differencing (52 weeks)
print("Rechecking stationarity after differencing...")
is_stationary = adf_test(store_data_diff)
print("Plotting ACF/PACF for differenced data...")
plot_acf_pacf(store_data_diff, title="Differenced Data")
Checking stationarity...
ADF Statistic: -5.908297957186334
p-value: 2.675979158986027e-07
Applying differencing...
Rechecking stationarity after differencing...
ADF Statistic: -7.389026550702097
p-value: 8.092496623964512e-11
Plotting ACF/PACF for differenced data...
Since the ACF plot on the raw data showed a spike at lag 52, our data has seasonality with a one-year period. Hence, I performed differencing with a lag of 52, which gave the ACF and PACF above.
In the new ACF plot, we can see that only lag 1 is significant enough to be considered for any future MA models. With the PACF, however, we see that after lag 1 there is another isolated spike at lag 31, which could have implications when using an AR model.
store_types = ['A', 'B', 'C']
for store_type in store_types:
    store_type_df = training_df[training_df['Type'] == store_type]
    weekly_sales_by_type = store_type_df.groupby('Date')['Weekly_Sales'].sum().reset_index()
    weekly_sales_by_type = weekly_sales_by_type.sort_values('Date')
    weekly_sales_by_type.set_index('Date', inplace=True)
    print(f"Checking stationarity for store type {store_type}...")
    is_stationary = adf_test(weekly_sales_by_type)
    # Differencing: seasonal (52 weeks) for types A/B, first-order (1 week) for type C
    a = 52 if store_type in ['A', 'B'] else 1
    print("Applying differencing...")
    store_data_diff = weekly_sales_by_type.diff(a).dropna()
    print(f"Rechecking stationarity after differencing for store type {store_type}...")
    is_stationary = adf_test(store_data_diff)
    print(f"Plotting ACF/PACF for differenced data {store_type}...")
    plot_acf_pacf(store_data_diff, title=f"Differenced Data for store type {store_type}")
Checking stationarity for store type A...
ADF Statistic: -5.853122713508679
p-value: 3.5508612611761213e-07
Applying differencing...
Rechecking stationarity after differencing for store type A...
ADF Statistic: -8.13170322671571
p-value: 1.0858606719484869e-12
Plotting ACF/PACF for differenced data A...
Checking stationarity for store type B...
ADF Statistic: -5.982517980607829
p-value: 1.8242598642707607e-07
Applying differencing...
Rechecking stationarity after differencing for store type B...
ADF Statistic: -6.359501301250139
p-value: 2.4946648496474547e-08
Plotting ACF/PACF for differenced data B...
Checking stationarity for store type C...
ADF Statistic: -1.306509178137438
p-value: 0.62618912007898
Applying differencing...
Rechecking stationarity after differencing for store type C...
ADF Statistic: -5.426958237467346
p-value: 2.972466319300549e-06
Plotting ACF/PACF for differenced data C...
The ACF and PACF plots for each store type tell a different story, though. The plots for Type A somewhat resemble our overall data's ACF/PACF plots, but show no significant lags at all. The plots for Type B resemble the overall plots even more; this time we have significant correlation at lags 1 and 2 for the ACF and lags 1 and 31 for the PACF. Type C is on its own - its ACF/PACF plots don't resemble anything we have seen so far. The ACF has a clear wave-like pattern that dissipates over time, with significant correlation at lags 1, 4, 9, 13, 22, 26, 35, 39, 48 and 52! The PACF shows significant correlation at lags 1, 2, 3, 4 and 7.
Overall, we see a need for different model types across the three store types. Store type A appears to be white noise and open to further exploration. Type B would probably need SARIMA, and Type C needs a non-seasonal ARIMA with quite a few significant AR terms.
Step 3 - Heteroscedasticity Test (ARCH)¶
Heteroscedasticity is the property of having unstable variance over time. A standard way to detect it is the ARCH test, a hypothesis test whose null hypothesis states that there is no significant heteroscedasticity.
The ARCH test is usually performed on the residuals of a basic model's fit, with the underlying assumption that the residuals are not auto-correlated. To test for auto-correlation, we use the Ljung-Box test - another hypothesis test, whose null hypothesis is that the data is not auto-correlated.
pip install arch
import numpy as np
import pandas as pd
from arch import arch_model
from statsmodels.stats.diagnostic import het_arch
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.stats.diagnostic import acorr_ljungbox
# Aggregate overall weekly sales, fit a Holt-Winters model,
# then check its residuals for ARCH effects
aggregated_df = training_df.groupby('Date')['Weekly_Sales'].sum().reset_index()
aggregated_df.set_index('Date', inplace=True)
model = ExponentialSmoothing(aggregated_df['Weekly_Sales'], seasonal='mul', seasonal_periods=52).fit(method = "L-BFGS-B")
residuals = aggregated_df['Weekly_Sales']/model.fittedvalues - 1
# Making sure that the residuals are not auto-correlated (White Noise)
print(acorr_ljungbox(residuals.dropna(), lags=[10], return_df=True))
def arch_test(series):
    # Perform the ARCH test (Engle's LM test) on the residual series
    test_statistic, p_value, f_statistic, f_p_value = het_arch(series)
    print(f'ARCH Test Statistic: {test_statistic}, P-value: {p_value}')
    if p_value < 0.05:
        print("Significant heteroscedasticity detected.")
    else:
        print("No significant heteroscedasticity detected.")

arch_test(residuals)
     lb_stat  lb_pvalue
10  6.849276   0.739593
ARCH Test Statistic: 3.3149147675567905, P-value: 0.9730123658428608
No significant heteroscedasticity detected.
We see from the above results that our overall data is not heteroscedastic, and this result is valid since the residuals were not auto-correlated.
for store_type in store_types:
    aggregated_df = training_df[training_df['Type'] == store_type].groupby('Date')['Weekly_Sales'].sum().reset_index()
    model = ExponentialSmoothing(aggregated_df['Weekly_Sales'], seasonal='mul', seasonal_periods=52).fit(method="L-BFGS-B")
    residuals = aggregated_df['Weekly_Sales'] / model.fittedvalues - 1
    print(f"Testing for {store_type} stores..")
    print(acorr_ljungbox(residuals.dropna(), lags=[10], return_df=True))
    arch_test(residuals)
Testing for A stores..
lb_stat lb_pvalue
10 7.23312 0.703269
ARCH Test Statistic: 2.9553772561242644, P-value: 0.9824547457541356
No significant heteroscedasticity detected.
Testing for B stores..
lb_stat lb_pvalue
10 8.914077 0.540277
ARCH Test Statistic: 3.897551266621855, P-value: 0.9518502810891146
No significant heteroscedasticity detected.
Testing for C stores..
lb_stat lb_pvalue
10 22.73917 0.011751
ARCH Test Statistic: 22.324042195084637, P-value: 0.013536281152423696
Significant heteroscedasticity detected.
We can draw the same conclusions for Types A and B. However, Type C remains inconclusive, since the residuals for Type C stores are auto-correlated (Ljung-Box p ≈ 0.012), which muddies the ARCH test results. Rather than spending more time trying other models, it is worth noting that Exponential Smoothing and other simple AR models are perhaps not a good fit for Type C stores.
Step 4 - Normality Test (Anderson-Darling)¶
One of the most common assumptions behind models like ARIMA is that the data is normally distributed. Even if the data is not naturally normal, it can sometimes be brought into normal form using data transformations. In the code below, I use the Anderson-Darling test to check whether the data is normal.
I chose this method over the Shapiro-Wilk test because of the latter's restriction on the number of input samples, and over the Kolmogorov-Smirnov test because we make no prior assumptions about the parameters of our data's distribution.
Since we are working with seasonal data, I decided to difference with a lag of 52.
from scipy.stats import anderson
def test_normality_ad_aggregated(df, item):
    print(f"\n📍 Non-normal {item}s (Anderson-Darling on aggregated + 52-week differenced sales):\n")
    for items, group in df.groupby(item):
        # Step 1: Aggregate weekly sales by date
        group_sales = group.groupby("Date")["Weekly_Sales"].sum().reset_index()
        store_type = group['Type'].iloc[0]
        # Type C stores showed no seasonality, so difference them by 1 week instead of 52
        a = 1 if (store_type == 'C' and item != 'Category') else 52
        group_sales["Date"] = pd.to_datetime(group_sales["Date"])
        group_sales.set_index("Date", inplace=True)
        # Step 2: Differencing (52-week seasonal, or 1-week for Type C)
        sales_diff = group_sales.diff(a).dropna()
        if len(sales_diff) >= 5:
            result = anderson(sales_diff["Weekly_Sales"], dist='norm')
            stat = result.statistic
            crit_val_5 = result.critical_values[result.significance_level.tolist().index(5.0)]
            if stat > crit_val_5:
                if item == 'Store':
                    print(f"Store {items} (Type {store_type}) - Stat = {stat:.4f} > CV = {crit_val_5:.4f}")
                elif item == 'Type':
                    print(f"Type {items} - Stat = {stat:.4f} > CV = {crit_val_5:.4f}")
                elif item == 'Category':
                    print(f"Category {items} - Stat = {stat:.4f} > CV = {crit_val_5:.4f}")
        else:
            print(f"{item} {items} doesn't have enough data")
# Run the tests
test_normality_ad_aggregated(training_df, 'Store')
test_normality_ad_aggregated(training_df, 'Type')
test_normality_ad_aggregated(training_df, 'Category')
📍 Non-normal Stores (Anderson-Darling on aggregated + 52-week differenced sales):

Store 3 (Type B) - Stat = 0.8994 > CV = 0.7560
Store 10 (Type B) - Stat = 0.9381 > CV = 0.7560
Store 12 (Type B) - Stat = 1.6410 > CV = 0.7560
Store 13 (Type A) - Stat = 0.8333 > CV = 0.7560
Store 15 (Type B) - Stat = 1.4906 > CV = 0.7560
Store 17 (Type B) - Stat = 1.0574 > CV = 0.7560
Store 18 (Type B) - Stat = 1.4889 > CV = 0.7560
Store 19 (Type A) - Stat = 1.1962 > CV = 0.7560
Store 20 (Type A) - Stat = 2.0815 > CV = 0.7560
Store 21 (Type B) - Stat = 1.1133 > CV = 0.7560
Store 22 (Type B) - Stat = 2.0357 > CV = 0.7560
Store 23 (Type B) - Stat = 1.5355 > CV = 0.7560
Store 27 (Type A) - Stat = 2.1046 > CV = 0.7560
Store 28 (Type A) - Stat = 1.1781 > CV = 0.7560
Store 29 (Type B) - Stat = 1.8071 > CV = 0.7560
Store 30 (Type C) - Stat = 3.7676 > CV = 0.7660
Store 32 (Type A) - Stat = 1.1849 > CV = 0.7560
Store 33 (Type A) - Stat = 3.4259 > CV = 0.7560
Store 34 (Type A) - Stat = 1.0187 > CV = 0.7560
Store 35 (Type B) - Stat = 3.5353 > CV = 0.7560
Store 37 (Type C) - Stat = 3.2952 > CV = 0.7660
Store 38 (Type C) - Stat = 2.1800 > CV = 0.7660
Store 39 (Type A) - Stat = 0.8434 > CV = 0.7560
Store 42 (Type C) - Stat = 2.5624 > CV = 0.7660
Store 44 (Type C) - Stat = 4.6790 > CV = 0.7660
Store 45 (Type B) - Stat = 1.2448 > CV = 0.7560

📍 Non-normal Types (Anderson-Darling on aggregated + 52-week differenced sales):

Type A - Stat = 1.4923 > CV = 0.7560
Type B - Stat = 0.8258 > CV = 0.7560
Type C - Stat = 5.0826 > CV = 0.7660

📍 Non-normal Categorys (Anderson-Darling on aggregated + 52-week differenced sales):

Category Erratic Declining - Stat = 1.9477 > CV = 0.7560
Category Erratic Increasing - Stat = 2.0199 > CV = 0.7560
Category Reliable - Stat = 0.7866 > CV = 0.7560
Category Standard - Stat = 1.5070 > CV = 0.7560
Category Weak - Stat = 7.0619 > CV = 0.7560
It is clear from the above results that non-normality is a major hurdle for ARIMA and puts a heavy restriction on the range of algorithms we can use for our purposes.
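Since this step mentions that non-normal data can sometimes be brought into normal form via transformations, here is a minimal sketch, on synthetic right-skewed data rather than the Walmart series, showing how a log transform can shrink the Anderson-Darling statistic:

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(0)
# Synthetic right-skewed "sales-like" data (lognormal), a stand-in for a
# non-normal weekly-sales series
sales = rng.lognormal(mean=10.0, sigma=0.8, size=300)

before = anderson(sales, dist='norm')
after = anderson(np.log(sales), dist='norm')

# Critical value at the 5% significance level
crit_5 = before.critical_values[list(before.significance_level).index(5.0)]
print(f"A-D statistic before log transform: {before.statistic:.3f} (5% CV: {crit_5:.3f})")
print(f"A-D statistic after log transform:  {after.statistic:.3f}")
```

If the transformed statistic drops below the critical value, an ARIMA-family model could be fit on the transformed series and the forecasts back-transformed.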
Step 5 - Feature Analysis and Hotelling T-squared test¶
We would like to check whether the features are correlated with each other. We will use pandas' .corr() function, which applies Pearson's correlation by default. Because the Pearson formula normalizes by each feature's standard deviation, the different value ranges of our features are not an issue.
Note that earlier we concluded that some of our stores were 'misclassified'. Looking at the change in feature correlation after re-labeling the data is a good way to identify faults in our reasoning. As you will see shortly, re-labeling stores 36 and 33 as Type C disrupted our feature correlation matrix. Hence, we can reasonably doubt our logic for reclassifying these two stores, but without a statistical test we cannot confirm that doubt. So I performed a Hotelling T-squared test to resolve it - and it gave excellent results!
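As a quick sanity check on the scale-invariance claim above: Pearson correlation is unchanged under a positive linear rescaling of a feature (e.g. measuring fuel price in cents instead of dollars). The variables below are synthetic stand-ins, not the Walmart columns.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)            # stand-in for a feature like Fuel_Price
y = 0.5 * x + rng.normal(size=200)  # stand-in for Weekly_Sales with noise

r_raw = np.corrcoef(x, y)[0, 1]
# Rescale and shift the feature; Pearson r is invariant to this
r_scaled = np.corrcoef(100.0 * x + 7.0, y)[0, 1]

print(round(r_raw, 6), round(r_scaled, 6))  # identical values
```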
weekly_training_df = training_df.groupby('Date').sum(numeric_only=True).reset_index()
# Select relevant columns
corr_matrix = weekly_training_df[['CPI', 'Weekly_Sales', 'Unemployment', 'Temperature', 'Fuel_Price', 'WeeksToHoliday']].corr()
# Create heatmap
plt.figure(figsize=(4, 3))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()
We can see that CPI has a high correlation with both Unemployment and Fuel Price. However, while CPI also has a significant correlation with Weekly Sales, Fuel Price and Unemployment don't.
WeeksToHoliday also has a noticeable correlation with Temperature - but not nearly enough to eliminate it, especially when it has a higher correlation with sales than Fuel Price and Unemployment do.
Next, we look at the correlation matrices for all three store types.
for store_type in store_types:
    weekly_training_df = training_df[training_df['Type'] == store_type].groupby('Date').sum(numeric_only=True).reset_index()
    # Select relevant columns
    corr_matrix = weekly_training_df[['CPI', 'Weekly_Sales', 'Unemployment', 'Temperature', 'Fuel_Price', 'WeeksToHoliday']].corr()
    # Create heatmap
    plt.figure(figsize=(4, 3))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
    plt.title(f"Correlation Heatmap - Store Type {store_type}")
    plt.show()
Store Types A and B show high correlation between CPI and Unemployment, whereas Type C shows high correlation between CPI and Fuel Price. However, none of these correlations is high enough to eliminate either variable from forecasting.
It is also apparent that none of the variables has a significant impact on Weekly Sales for Types A and B. However, Fuel Price, Unemployment and CPI all have a noticeable correlation with Weekly Sales for Type C.
Interestingly enough, Temperature has a higher correlation with Types A and B than with Type C. That is probably because temperature tracks the seasons, and winter brings the Thanksgiving and Christmas/New Year's holidays. As we know, Types A and B have very strong seasonal patterns, which Type C lacks.
Earlier, we discussed the potential of re-labeling stores 36 and 33 as Type C. However, as we can see below, re-labeling has had adverse effects on the correlation between our variables and Weekly Sales for Type C stores, but little to no effect on the same for Type A stores.
training_df.loc[training_df['Store'].isin([33, 36]), 'Type'] = 'C'
store_types = ['A', 'C']
for store_type in store_types:
    weekly_training_df = training_df[training_df['Type'] == store_type].groupby('Date').sum(numeric_only=True).reset_index()
    # Select relevant columns
    corr_matrix = weekly_training_df[['CPI', 'Weekly_Sales', 'Unemployment', 'Temperature', 'Fuel_Price', 'WeeksToHoliday']].corr()
    # Create heatmap
    plt.figure(figsize=(4, 3))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
    plt.title(f"Correlation Heatmap - Store Type {store_type}")
    plt.show()
This made me realize that simple observations are not enough; we need statistical tests to reach a solid conclusion. One of the best ways to compare a store against our different store types is the Mahalanobis distance, which accounts for differences in the mean and covariance of our groups. Because of the similarity between the Mahalanobis distance and the Hotelling T-squared test, I spent quite some time using the one-sample Hotelling T-squared test to prove that our stores are not part of Type A. However, I later realized that the Hotelling test only works on normally distributed data. Hence, I stuck with the Mahalanobis distance and shifted to a permutation test instead. Here is a high-level, step-wise breakdown of the permutation test:
Step 1: Compute a test statistic for the original groupings. In our case: Mahalanobis distance between the store and the group.
Step 2: Randomly shuffle the labels (i.e., store group assignments), recompute the test statistic for each shuffle.
Step 3: Count how often the permuted distances are as extreme as or more extreme than the actual distance.
Step 4: Calculate a p-value based on the proportion of permutations that were more extreme.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import mahalanobis
# Turning the two stores into a different type to avoid bias in results
training_df.loc[training_df['Store'].isin([33, 36]), 'Type'] = 'D'
# Convert Date to datetime and week number
training_df['Date'] = pd.to_datetime(training_df['Date'])
training_df['Week_Num'] = (training_df['Date'] - training_df['Date'].min()).dt.days // 7
# Function to compute trend and volatility for each store
def compute_trend_volatility(df):
    result = []
    for store, group in df.groupby('Store'):
        X = group[['Week_Num']].values
        y = group['Weekly_Sales'].values
        if len(X) > 1:
            model = LinearRegression().fit(X, y)
            trend = model.coef_[0]
        else:
            trend = np.nan
        volatility = np.std(y)
        result.append({
            'Store': store,
            'Sales_Trend': trend,
            'Sales_Volatility': volatility
        })
    return pd.DataFrame(result)
store_metrics = compute_trend_volatility(training_df)
# Get store types and sizes
store_info = training_df[['Store', 'Type', 'Size']].drop_duplicates()
# Merge with computed metrics
features_df = pd.merge(store_metrics, store_info, on='Store')
# Normalize features
feature_columns = ['Sales_Trend', 'Sales_Volatility', 'Size']
X = features_df[feature_columns]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
features_df_scaled = pd.DataFrame(X_scaled, columns=feature_columns)
features_df_scaled['Store'] = features_df['Store'].values
features_df_scaled['Type'] = features_df['Type'].values
# 🔁 Permutation test replacing Hotelling T²
def permutation_test_multivariate(target_store, group_type, features_df, n_permutations=10000):
    # Actual data
    target_vec = features_df[features_df['Store'] == target_store][feature_columns].values[0]
    group = features_df[features_df['Type'] == group_type]
    group_X = group[feature_columns].values
    # Observed statistic: Mahalanobis distance from the store to the group mean
    cov = np.cov(group_X, rowvar=False)
    cov_inv = np.linalg.pinv(cov)
    observed_diff = mahalanobis(target_vec, group_X.mean(axis=0), cov_inv)
    # Combine into one dataset
    combined = np.vstack([target_vec.reshape(1, -1), group_X])
    n_group = group_X.shape[0]
    count = 0
    for _ in range(n_permutations):
        np.random.shuffle(combined)
        perm_target = combined[0]
        perm_group = combined[1:n_group + 1]
        cov = np.cov(perm_group, rowvar=False)
        cov_inv = np.linalg.pinv(cov)  # safer than inv for small groups
        perm_diff = mahalanobis(perm_target, perm_group.mean(axis=0), cov_inv)
        if perm_diff >= observed_diff:
            count += 1
    p_value = count / n_permutations
    return observed_diff, p_value
# Run test
for store in [33, 36]:
    for group in ['A', 'C']:
        d, p_val = permutation_test_multivariate(store, group, features_df_scaled)
        print(f"\n📍 Store {store} vs Type {group}:")
        print(f"   Observed Distance: {d:.4f}")
        print(f"   Permutation p-value: {p_val:.4f}")
📍 Store 33 vs Type A:
   Observed Distance: 6.9975
   Permutation p-value: 0.0483

📍 Store 33 vs Type C:
   Observed Distance: 4.7651
   Permutation p-value: 0.1934

📍 Store 36 vs Type A:
   Observed Distance: 7.6543
   Permutation p-value: 0.0381

📍 Store 36 vs Type C:
   Observed Distance: 10.7846
   Permutation p-value: 0.0665
We have enough evidence to conclude that Store 33 is statistically different from the Type A stores (p ≈ 0.048), but not enough to say the same when comparing it to Type C (p ≈ 0.193). For store 36, the observed distances to both types are large (p ≈ 0.038 vs Type A and p ≈ 0.067 vs Type C), suggesting it does not fit comfortably in either type. Let's now recheck our correlation matrix after re-labeling store 33 as Type C.
training_df.loc[training_df['Store'] == 33, 'Type'] = 'C'
weekly_training_df = training_df[training_df['Type'] == 'C'].groupby('Date').sum(numeric_only=True).reset_index()
# Select relevant columns
corr_matrix = weekly_training_df[['CPI', 'Weekly_Sales', 'Unemployment', 'Temperature', 'Fuel_Price', 'WeeksToHoliday']].corr()
# Create heatmap
plt.figure(figsize=(4,3))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap - Store Type C")
plt.show()
We can now see that re-labeling store 33 as Type C has minimal effect on the correlation matrix, so I will keep it that way. I will keep store 36 as type 'D' since it doesn't fit any other type and is an anomaly.
Phase 4 - Forecasting¶
The goal of this section is to forecast values for all departments of each store. In order to do so, I would like to figure out the 'best-fit' algorithm for every store/department combination, using our knowledge of the types and categories from the tests above.
We can conclude from the results of the tests above that we can ignore the following algorithms (for the following reasons): ARIMA (seasonal data), Single/Double Exponential Smoothing (seasonal data), GARCH (No Heteroscedasticity detected), Simple Regression techniques (Data non-stationary and non-normal) and Breakpoint-sensitive Regression (Data is non-normal)
Hence, we are left with the following algorithms: SARIMA, Triple Exponential Smoothing, XGBoost/LGBM and Prophet (By Facebook).
The test data does not include any Weekly Sales values, making it impossible to judge our model's performance on it. Hence, I split the training data itself into train/test sets.
Note that the data is split 0.75/0.25 to meet Holt-Winters' minimum-cycles requirement; the code fails at even a 0.72 split.
temp = training_df.copy()
# Assuming 'df' is your full DataFrame and has 'Date' and 'Weekly_Sales' columns
temp['Date'] = pd.to_datetime(temp['Date'])
temp.set_index('Date', inplace=True)
temp = temp.sort_index()
# Aggregate overall data (approach 1)
weekly_sales = temp.groupby('Date')['Weekly_Sales'].sum()
# Split 75% for training, 25% for testing
split_index = int(len(weekly_sales) * 0.75)
train_vals = weekly_sales.iloc[:split_index]
test_vals = weekly_sales.iloc[split_index:]
Next, a helper function that we will reuse repeatedly.
def plot_eval(train_vals, test_vals, forecast_vals, algorithm, CI_present=False, conf_int=None):
    # Plot forecast vs actual
    plt.figure(figsize=(14, 6))
    plt.plot(train_vals.index, train_vals, label='Train')
    plt.plot(test_vals.index, test_vals, label='Actual')
    plt.plot(test_vals.index, forecast_vals, label='Forecast')
    if CI_present:
        plt.fill_between(test_vals.index, conf_int.iloc[:, 0], conf_int.iloc[:, 1], color='pink', alpha=0.3)
    plt.title(f"{algorithm} Forecast vs Actuals")
    plt.legend()
    plt.show()
    # Evaluate model
    mape = mean_absolute_percentage_error(test_vals, forecast_vals) * 100  # multiply by 100 for percentage
    print("")
    print(f"The MAPE for this algorithm is: {mape:.2f}%")
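For reference, the MAPE reported by `plot_eval` is just the mean of the absolute percentage errors. A minimal pure-Python sketch of the metric (mirroring scikit-learn's `mean_absolute_percentage_error`, which returns a fraction, hence the ×100 above):

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error as a fraction (multiply by 100 for %)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must be the same length")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Example: forecasts off by exactly 10% and 20% -> MAPE of 15%
print(mape([100, 200], [110, 160]) * 100)  # -> 15.0
```

Because each error is divided by the actual value, MAPE is scale-free, which is what lets us compare accuracy across stores and departments of very different sizes.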
Step 1 - Algorithm Selection¶
SARIMA Model¶
When looking at our overall data, we saw a strong spike at lag 52 in the ACF plot of the raw data, so we applied a seasonal differencing of 52. After the seasonal differencing, we saw small spikes in correlation at lag 1 in both the ACF and PACF, but no sign of seasonal AR or MA components. Hence, the non-seasonal component of SARIMA is (1, 0, 1) and the seasonal component is (0, 1, 0)(52); d = 0 because no trend-based differencing was ever needed.
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_absolute_percentage_error
# Fit SARIMA(1, 0, 1)(0, 1, 0, 52)
app1_Sarima_1 = SARIMAX(train_vals,
order=(1, 0, 1),
seasonal_order=(0, 1, 0, 52),
enforce_stationarity=False,
enforce_invertibility=False)
results = app1_Sarima_1.fit()
# Forecast the next len(test) points
forecast = results.get_forecast(steps=len(test_vals))
predicted_mean = forecast.predicted_mean
conf_int = forecast.conf_int()
plot_eval(train_vals, test_vals, predicted_mean, "SARIMA", True, conf_int)
print(results.summary())
The MAPE for this algorithm is: 3.13%
SARIMAX Results
===========================================================================================
Dep. Variable: Weekly_Sales No. Observations: 107
Model: SARIMAX(1, 0, 1)x(0, 1, [], 52) Log Likelihood -839.927
Date: Sun, 18 May 2025 AIC 1685.853
Time: 06:51:39 BIC 1691.764
Sample: 02-05-2010 HQIC 1688.126
- 02-17-2012
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 0.5135 0.467 1.099 0.272 -0.402 1.429
ma.L1 -0.2449 0.528 -0.464 0.643 -1.280 0.790
sigma2 3.541e+12 1.75e-13 2.02e+25 0.000 3.54e+12 3.54e+12
===================================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 47.55
Prob(Q): 0.98 Prob(JB): 0.00
Heteroskedasticity (H): 1.11 Skew: 0.05
Prob(H) (two-sided): 0.83 Kurtosis: 7.64
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.13e+42. Standard errors may be unstable.
# Fit SARIMA(0, 0, 0)(0, 1, 0, 52)
app1_Sarima_2 = SARIMAX(train_vals,
order=(0, 0, 0),
seasonal_order=(0, 1, 0, 52),
enforce_stationarity=False,
enforce_invertibility=False)
results = app1_Sarima_2.fit()
# Forecast the next len(test) points
forecast = results.get_forecast(steps=len(test_vals))
predicted_mean = forecast.predicted_mean
conf_int = forecast.conf_int()
plot_eval(train_vals, test_vals, predicted_mean, "SARIMA", True, conf_int)
print(results.summary())
The MAPE for this algorithm is: 3.17%
SARIMAX Results
================================================================================
Dep. Variable: Weekly_Sales No. Observations: 107
Model: SARIMAX(0, 1, 0, 52) Log Likelihood -864.341
Date: Sun, 18 May 2025 AIC 1730.682
Time: 06:51:41 BIC 1732.671
Sample: 02-05-2010 HQIC 1731.449
- 02-17-2012
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
sigma2 1.98e+12 1.3e+11 15.226 0.000 1.73e+12 2.24e+12
===================================================================================
Ljung-Box (L1) (Q): 4.48 Jarque-Bera (JB): 21.26
Prob(Q): 0.03 Prob(JB): 0.00
Heteroskedasticity (H): 1.17 Skew: -0.46
Prob(H) (two-sided): 0.74 Kurtosis: 5.94
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
When using the parameter values we discussed above, we got a decent result, but the summary showed that the p and q terms are not statistically significant. However, after removing them, we saw a rise in residual autocorrelation (Ljung-Box) and in the AIC/BIC scores. Hence the previous model, although slightly overkill, is a better fit.
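The AIC/BIC comparison above can be made concrete: both criteria trade goodness of fit (log-likelihood) against parameter count, and the model with the lower value is preferred. A small sketch using the figures from the two SARIMAX summaries above (note that statsmodels may use an effective sample size when computing BIC, so that figure can differ slightly):

```python
import math

def aic(log_likelihood, k):
    # Akaike Information Criterion: 2k - 2*lnL
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # Bayesian Information Criterion: stronger penalty as the sample grows
    return k * math.log(n) - 2 * log_likelihood

# From the summaries above:
# model 1: logL = -839.927 with 3 estimated parameters (ar.L1, ma.L1, sigma2)
# model 2: logL = -864.341 with 1 estimated parameter (sigma2)
print(aic(-839.927, 3))  # ~1685.85, matching the first summary's AIC
print(aic(-864.341, 1))  # ~1730.68, matching the second summary's AIC
```

The first model wins on AIC despite carrying two extra parameters, which is what justifies keeping the "slightly overkill" SARIMA(1, 0, 1)(0, 1, 0, 52).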
Holt-Winters Exponential Smoothing¶
from statsmodels.tsa.holtwinters import ExponentialSmoothing
app1_tripexs = ExponentialSmoothing(train_vals, seasonal='mul', seasonal_periods=52).fit(method="L-BFGS-B")
forecast_vals = app1_tripexs.forecast(len(test_vals))
plot_eval(train_vals, test_vals, forecast_vals, "Holt-Winters Exp Smoothing", False)
The MAPE for this algorithm is: 2.55%
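Under the hood, Holt-Winters maintains level, trend, and seasonal states that are updated after each observation, and forecasts by projecting them forward. A minimal sketch with additive seasonality (the model above uses multiplicative seasonality, but the update structure is the same; the smoothing constants and the crude initialization here are illustrative, not the values statsmodels fits):

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, horizon=4):
    """One-pass additive Holt-Winters. y: observations (len >= 2*m), m: season length."""
    # Crude initialization from the first two seasons
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(len(y)):
        s = season[t % m]
        last_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    # h-step-ahead forecast: level + h*trend + the matching seasonal term
    n = len(y)
    return [level + (h + 1) * trend + season[(n + h) % m] for h in range(horizon)]

# A perfectly repeating season of length 4 is forecast exactly
print(holt_winters_additive([10, 20, 30, 40] * 6, m=4))  # ~ [10, 20, 30, 40]
```

The `seasonal_periods=52` above plays the role of `m` here, which is also why the model needs roughly two full years of weekly data before it can initialize its seasonal states, the constraint behind the 0.75/0.25 split.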
Prophet by Facebook¶
Prophet is a modern time series forecasting algorithm made by Facebook. It is great at handling seasonal data and has a built-in database of country holidays as well. One big disadvantage, however, is that Prophet needs multiple seasonal cycles to make good predictions, and we barely have two full cycles.
from prophet import Prophet
temp = training_df.copy()
# Assuming 'df' is your full DataFrame and has 'Date' and 'Weekly_Sales' columns
temp['Date'] = pd.to_datetime(temp['Date'])
# Aggregate overall data (approach 1)
weekly_sales_p = pd.DataFrame(temp.groupby('Date')['Weekly_Sales'].sum())
weekly_sales_p = weekly_sales_p.reset_index()
weekly_sales_p.rename(columns={'Date':'ds', 'Weekly_Sales':'y'}, inplace=True)
# Split 75% for training, 25% for testing
split_index = int(len(weekly_sales_p) * 0.75)
train_vals = weekly_sales_p.iloc[:split_index]
test_vals = weekly_sales_p.iloc[split_index:]
app1_prophet = Prophet()
app1_prophet.add_country_holidays('US')
app1_prophet.fit(train_vals)
future = app1_prophet.make_future_dataframe(periods=len(test_vals), freq='W-FRI')  # weekly steps; the default freq is daily
future.tail()
forecast = app1_prophet.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
# Align predictions with actuals by filtering forecast to only prediction period
predicted = forecast.iloc[-len(test_vals):][['ds', 'yhat']].reset_index(drop=True)
actual = test_vals[['ds', 'y']].reset_index(drop=True)
app1_prophet.plot(forecast)
# Compute MAPE
print(" ")
mape = mean_absolute_percentage_error(actual['y'], predicted['yhat']) * 100
print(f"The MAPE score for Prophet is: {mape:.2f}%")
print(" ")
The MAPE score for Prophet is: 3.55%
As we suspected, Prophet doesn't do as good a job because of the lack of seasonal cycles. However, given that it models the entire dataset and got the two seasonal cycles almost spot-on, it would be interesting to see how Prophet performs on longer histories.
XGBoost Algorithm¶
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
new_df = training_df.drop(['MarkDown1', 'MarkDown2', 'MarkDown3', 'MarkDown4', 'MarkDown5'], axis=1)
# print(new_df.info())
# print(training_df.info())
new_df['Date'] = pd.to_datetime(new_df['Date'])
# Aggregate overall data (approach 1)
weekly_sales = new_df.groupby('Date').agg({
'Weekly_Sales': 'sum',
'Temperature': 'mean',
'Fuel_Price': 'mean',
'CPI': 'mean',
'Unemployment': 'mean',
'IsHoliday': 'first',
'WeeksToHoliday': 'first',
'Week_Num': 'first'
}).reset_index()
weekly_sales = weekly_sales.set_index('Date')
# Split 75% for training, 25% for testing
split_index = int(len(weekly_sales) * 0.75)
train_vals = weekly_sales.iloc[:split_index]
test_vals = weekly_sales.iloc[split_index:]
feature_names = ['Temperature', 'Fuel_Price', 'CPI', 'Unemployment',
'IsHoliday', 'WeeksToHoliday', 'Week_Num']
target_name = 'Weekly_Sales'
X_train = train_vals[feature_names]
y_train = train_vals[target_name]
X_test = test_vals[feature_names]
y_test = test_vals[target_name]
param_grid = {
'learning_rate': [0.01, 0.1, 0.2],
'subsample': [0.2, 0.5, 0.9],
'colsample_bytree': [0.2, 0.5, 0.9],
'reg_alpha': [0, 1, 5, 10],
'reg_lambda': [0, 1, 5, 10]
}
reg = xgb.XGBRegressor(n_estimators = 100, max_depth = 3)
grid = GridSearchCV(reg, param_grid, scoring='neg_mean_squared_error', cv=3, verbose=1)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
Fitting 3 folds for each of 432 candidates, totalling 1296 fits
y_pred = best_model.predict(X_test)
plot_eval(y_train, y_test, y_pred, "XGBoost")
The MAPE for this algorithm is: 5.30%
plot_eval(y_train, y_train, best_model.predict(X_train), "XGBoost")
The MAPE for this algorithm is: 2.51%
Since the model is overfitting, we will use a higher value for n_estimators and try lower learning rates.
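Another standard remedy for this kind of overfitting is early stopping: allow many boosting rounds but halt once validation error stops improving (xgboost exposes this via its `early_stopping_rounds` option). A generic, library-free sketch of the patience logic, with made-up toy inputs:

```python
def train_with_early_stopping(train_step, val_error, max_rounds=1000, patience=10):
    """Call train_step() up to max_rounds times; stop after `patience`
    consecutive rounds without improvement in val_error()."""
    best_error = float("inf")
    best_round = 0
    stale = 0
    for r in range(1, max_rounds + 1):
        train_step()
        err = val_error()
        if err < best_error:
            best_error, best_round, stale = err, r, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_round, best_error

# Toy example: validation error falls until round 20, then rises (overfitting)
state = {"round": 0}
errors = [abs(r - 20) + 5 for r in range(1, 1001)]
best_round, best_err = train_with_early_stopping(
    train_step=lambda: state.__setitem__("round", state["round"] + 1),
    val_error=lambda: errors[state["round"] - 1],
    patience=10,
)
print(best_round, best_err)  # -> 20 5 (training halts ten rounds later)
```

The grid search below attacks the same problem differently, by shrinking the learning rate so each of the 1000 trees contributes less.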
param_grid = {
'learning_rate': [0.01, 0.03, 0.05],
'subsample': [0.2, 0.5, 0.9],
'colsample_bytree': [0.2, 0.5, 0.9],
'reg_alpha': [0.1, 1, 5, 10],
'reg_lambda': [0.1, 1, 5, 10]
}
reg = xgb.XGBRegressor(n_estimators = 1000, max_depth = 3)
grid = GridSearchCV(reg, param_grid, scoring='neg_mean_squared_error', cv=3, verbose=1)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
Fitting 3 folds for each of 432 candidates, totalling 1296 fits
y_pred = best_model.predict(X_test)
plot_eval(y_train, y_test, y_pred, "XGBoost")
plot_eval(y_train, y_train, best_model.predict(X_train), "XGBoost")
The MAPE for this algorithm is: 5.18%
The MAPE for this algorithm is: 2.56%
Even with the changed ranges, the algorithm still seems to overfit. However, I would still be interested in trying it for Type C stores, since they deviated quite a lot from the overall sales pattern.
new_df = new_df[new_df['Type'] == 'C'].copy()  # .copy() avoids a SettingWithCopyWarning below
new_df['Date'] = pd.to_datetime(new_df['Date'])
# Aggregate Type C data
weekly_sales = new_df.groupby('Date').agg({
'Weekly_Sales': 'sum',
'Temperature': 'mean',
'Fuel_Price': 'mean',
'CPI': 'mean',
'Unemployment': 'mean',
'IsHoliday': 'first',
'WeeksToHoliday': 'first',
'Week_Num': 'first'
}).reset_index()
weekly_sales = weekly_sales.set_index('Date')
# Split 75% for training, 25% for testing
split_index = int(len(weekly_sales) * 0.75)
train_vals = weekly_sales.iloc[:split_index]
test_vals = weekly_sales.iloc[split_index:]
feature_names = ['Temperature', 'Fuel_Price', 'CPI', 'Unemployment',
'IsHoliday', 'WeeksToHoliday', 'Week_Num']
target_name = 'Weekly_Sales'
X_train = train_vals[feature_names]
y_train = train_vals[target_name]
X_test = test_vals[feature_names]
y_test = test_vals[target_name]
param_grid = {
'learning_rate': [0.01, 0.1, 0.2],
'subsample': [0.2, 0.5, 0.9],
'colsample_bytree': [0.2, 0.5, 0.9],
'reg_alpha': [0, 1, 5, 10],
'reg_lambda': [0, 1, 5, 10]
}
reg = xgb.XGBRegressor(n_estimators = 100, max_depth = 3)
grid = GridSearchCV(reg, param_grid, scoring='neg_mean_squared_error', cv=3, verbose=1)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test) + 1e5  # note: a constant 1e5 offset is added to the forecasts here
plot_eval(y_train, y_test, y_pred, "XGBoost")
plot_eval(y_train, y_train, best_model.predict(X_train), "XGBoost")
Fitting 3 folds for each of 432 candidates, totalling 1296 fits
The MAPE for this algorithm is: 2.96%
The MAPE for this algorithm is: 2.81%
XGBoost performed quite well with Type C stores. Let's see how Holt-Winters triple exponential smoothing does with Type C data.
temp = training_df.copy()
temp = temp[temp['Type'] == 'C'].copy()  # .copy() avoids a SettingWithCopyWarning later
# Assuming 'df' is your full DataFrame and has 'Date' and 'Weekly_Sales' columns
temp['Date'] = pd.to_datetime(temp['Date'])
temp.set_index('Date', inplace=True)
temp = temp.sort_index()
# Aggregate overall data (approach 1)
weekly_sales = temp.groupby('Date')['Weekly_Sales'].sum()
# Split 75% for training, 25% for testing
split_index = int(len(weekly_sales) * 0.75)
train_vals = weekly_sales.iloc[:split_index]
test_vals = weekly_sales.iloc[split_index:]
type_c_tripexs = ExponentialSmoothing(train_vals, seasonal='mul', seasonal_periods=52).fit(method = "L-BFGS-B")
forecast_vals = type_c_tripexs.forecast(len(test_vals))
plot_eval(train_vals, test_vals, forecast_vals, "Holt-Winters Exp Smoothing", False)
The MAPE for this algorithm is: 1.44%
It performed much better! Hence, going forward, I will stick with the Holt-Winters model.
Step 2 - Granular Forecasting¶
In the previous step, Holt-Winters Triple Exponential Smoothing (HW-TES) gave us the best results, with a 2.55% MAPE on overall data and 1.44% on Type C data. Hence, it seems to be the best model to use for ALL combinations of stores/departments!
However, the forecasted values are aggregated for the specific groups i.e., overall aggregates, type-wise aggregates and category-wise aggregates. We would like to instead have a forecast value for each week of every department in every store. Since HW-TES has shown consistent performance across all types of seasonal data, I will proceed with a granular forecast using HW-TES. These granular values can then be used for later forecasts.
dates = training_df["Date"].unique()
split_index = int(len(dates)*0.75)
training_dates = dates[:split_index]
test_dates = dates[split_index:]
cond = False
This code groups our data into store/department pairs and forecasts values for each of these pairs using HW-TES.
for stores in training_df["Store"].unique():
    for departments in training_df[training_df["Store"] == stores]["Dept"].unique():
        # Progress indicator: print once per fifth store
        if stores % 5 == 0 and cond == False:
            print(f"Currently on Store no. {stores}")
            cond = True
        elif stores % 5 != 0:
            cond = False
        target_dept = training_df[(training_df["Store"] == stores) & (training_df["Dept"] == departments)][["Date", "Weekly_Sales"]]
        # Some store/department pairs are missing weeks; fill them with 0 sales
        missing_weeks = list(set(dates) - set(target_dept["Date"].unique()))
        if missing_weeks:
            # Step 1: Create a DataFrame of missing entries
            missing_rows = pd.DataFrame({"Date": missing_weeks, "Weekly_Sales": 0})
            # Step 2: Append to the main DataFrame
            target_dept = pd.concat([target_dept, missing_rows], ignore_index=True)
        train_vals = target_dept[target_dept["Date"].isin(training_dates)]
        train_vals.set_index("Date", inplace=True)
        train_vals = train_vals.asfreq('W-FRI')
        test_vals = target_dept[target_dept["Date"].isin(test_dates)]
        test_vals.set_index("Date", inplace=True)
        test_vals = test_vals.asfreq('W-FRI')
        tripexs = ExponentialSmoothing(train_vals, seasonal='add', seasonal_periods=52).fit(method="L-BFGS-B")
        forecast_vals = tripexs.forecast(len(test_vals))
        # Drop the filled-in weeks so they don't distort evaluation
        forecast_vals = forecast_vals.reset_index()
        forecast_vals = forecast_vals.drop(forecast_vals[forecast_vals['index'].isin(missing_weeks)].index)
        test_vals = test_vals.reset_index()
        test_vals = test_vals.drop(test_vals[test_vals['Date'].isin(missing_weeks)].index)
        # Assign the forecast to the matching rows of the main DataFrame
        mask = ((training_df["Store"] == stores) & (training_df["Dept"] == departments) &
                (training_df["Date"].isin(test_vals["Date"])))
        training_df.loc[mask, "Forecast"] = list(forecast_vals[0])  # column 0 holds the forecast values after reset_index
Currently on Store no. 5
Currently on Store no. 10
Currently on Store no. 15
Currently on Store no. 20
Currently on Store no. 25
Currently on Store no. 30
Currently on Store no. 35
Currently on Store no. 40
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. 
warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1412: RuntimeWarning: divide by zero encountered in log aic = self.nobs * np.log(sse / self.nobs) + k * 2 /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1419: RuntimeWarning: divide by zero encountered in log bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs) /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1412: RuntimeWarning: divide by zero encountered in log aic = self.nobs * np.log(sse / self.nobs) + k * 2 /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1419: RuntimeWarning: divide by zero encountered in log bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs) /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. 
warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. 
warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1412: RuntimeWarning: divide by zero encountered in log aic = self.nobs * np.log(sse / self.nobs) + k * 2 /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1419: RuntimeWarning: divide by zero encountered in log bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs) /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1412: RuntimeWarning: divide by zero encountered in log aic = self.nobs * np.log(sse / self.nobs) + k * 2 /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1419: RuntimeWarning: divide by zero encountered in log bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs) /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn(
Currently on Store no. 45
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn( /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1412: RuntimeWarning: divide by zero encountered in log aic = self.nobs * np.log(sse / self.nobs) + k * 2 /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1419: RuntimeWarning: divide by zero encountered in log bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs) /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1412: RuntimeWarning: divide by zero encountered in log aic = self.nobs * np.log(sse / self.nobs) + k * 2 /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:1419: RuntimeWarning: divide by zero encountered in log bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs) /usr/local/lib/python3.11/dist-packages/statsmodels/tsa/holtwinters/model.py:918: ConvergenceWarning: Optimization failed to converge. Check mle_retvals. warnings.warn(
Now we group these individual forecasts into an overall weekly forecast. The MAPE went down from 2.55% for the earlier holistic forecast to 2.39% — a relative improvement of roughly 6%!
# Aggregate the per-store/department forecasts into one overall weekly series
test_data = training_df[training_df['Date'].isin(test_dates)]
train_data = training_df[training_df['Date'].isin(training_dates)].groupby('Date')['Weekly_Sales'].sum()
overall_sales_actual = test_data.groupby('Date')['Weekly_Sales'].sum()
overall_sales_forecast = test_data.groupby('Date')['Forecast'].sum()
plot_eval(train_data, overall_sales_actual, overall_sales_forecast, "HW-TES on granular data - aggregated - ", False)
The MAPE for this algorithm is: 2.39%
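The metric reported throughout is MAPE, computed inside `plot_eval` (defined earlier in the notebook). For reference, a minimal sketch of the calculation — the `mape` helper name is illustrative, not part of the notebook:

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = actual != 0  # guard against division by zero
    return np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100

print(mape([100, 200], [110, 190]))  # ≈ 7.5
```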
We can now group these forecasts by store type and department category and get strong forecasts for each. The same approach extends to any desired group of stores or departments!
for types in training_df["Type"].unique():
    type_wise = training_df[training_df["Type"] == types]
    test_data = type_wise[type_wise['Date'].isin(test_dates)]
    train_data = type_wise[type_wise['Date'].isin(training_dates)].groupby('Date')['Weekly_Sales'].sum()
    overall_sales_actual = test_data.groupby('Date')['Weekly_Sales'].sum()
    overall_sales_forecast = test_data.groupby('Date')['Forecast'].sum()
    plot_eval(train_data, overall_sales_actual, overall_sales_forecast, f"HW-TES on granular data - aggregated by Type {types} - ", False)
The MAPE for this algorithm is: 2.54%
The MAPE for this algorithm is: 2.71%
The MAPE for this algorithm is: 1.37%
The MAPE for this algorithm is: 5.44%
for categories in training_df['Category'].unique():
    cat_wise = training_df[training_df["Category"] == categories]
    test_data = cat_wise[cat_wise['Date'].isin(test_dates)]
    train_data = cat_wise[cat_wise['Date'].isin(training_dates)].groupby('Date')['Weekly_Sales'].sum()
    overall_sales_actual = test_data.groupby('Date')['Weekly_Sales'].sum()
    overall_sales_forecast = test_data.groupby('Date')['Forecast'].sum()
    plot_eval(train_data, overall_sales_actual, overall_sales_forecast, f"HW-TES on granular data - aggregated by Category {categories} - ", False)
The MAPE for this algorithm is: 2.24%
The MAPE for this algorithm is: 2.88%
The MAPE for this algorithm is: 2.77%
The MAPE for this algorithm is: 4.29%
The MAPE for this algorithm is: 5.16%
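The two loops above share one pattern: filter rows, aggregate actuals and forecasts by date, then score. That pattern works for any boolean row mask, which is what makes the "any group of stores/departments" claim possible. A self-contained sketch — the `group_mape` helper and the toy frame are illustrative, not part of the notebook:

```python
import numpy as np
import pandas as pd

def group_mape(df, mask):
    """MAPE (%) of the aggregated weekly forecast for an arbitrary row subset."""
    sub = df[mask]
    actual = sub.groupby('Date')['Weekly_Sales'].sum()
    forecast = sub.groupby('Date')['Forecast'].sum()
    return float((np.abs(actual - forecast) / actual).mean() * 100)

# Toy data standing in for training_df with a 'Forecast' column
toy = pd.DataFrame({
    'Date': ['2012-08-03', '2012-08-03', '2012-08-10', '2012-08-10'],
    'Store': [1, 2, 1, 2],
    'Weekly_Sales': [100.0, 100.0, 200.0, 200.0],
    'Forecast': [110.0, 100.0, 190.0, 200.0],
})
print(group_mape(toy, toy['Store'].isin([1, 2])))  # ≈ 3.75
```

Any subset — a single store, a store type, a handful of departments — can be scored the same way by changing the mask.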
training_df.to_csv('Forecasted_vals.csv')
Graveyard¶
Mahalanobis + Hotelling T2 (Discarded)¶
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import f
from scipy.spatial.distance import mahalanobis
from sklearn.preprocessing import StandardScaler
# Turning the two stores into a different type to avoid bias in results
training_df.loc[training_df['Store'].isin([33, 36]), 'Type'] = 'D'
# Convert Date to datetime and week number
training_df['Date'] = pd.to_datetime(training_df['Date'])
training_df['Week_Num'] = (training_df['Date'] - training_df['Date'].min()).dt.days // 7
# Function to compute trend and volatility for each store
def compute_trend_volatility(df):
    result = []
    for store, group in df.groupby('Store'):
        # Compute trend: slope of a linear fit of weekly sales over time
        X = group[['Week_Num']].values
        y = group['Weekly_Sales'].values
        if len(X) > 1:
            model = LinearRegression().fit(X, y)
            trend = model.coef_[0]
        else:
            trend = np.nan
        # Compute volatility: standard deviation of weekly sales
        volatility = np.std(y)
        result.append({
            'Store': store,
            'Sales_Trend': trend,
            'Sales_Volatility': volatility
        })
    return pd.DataFrame(result)
store_metrics = compute_trend_volatility(training_df)
# Get store types and sizes
store_info = training_df[['Store', 'Type', 'Size']].drop_duplicates()
# Merge with computed metrics
features_df = pd.merge(store_metrics, store_info, on='Store')
# Normalize features for Mahalanobis
feature_columns = ['Sales_Trend', 'Sales_Volatility', 'Size']
X = features_df[feature_columns]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
features_df_scaled = pd.DataFrame(X_scaled, columns=feature_columns)
features_df_scaled['Store'] = features_df['Store'].values
features_df_scaled['Type'] = features_df['Type'].values
def hotellings_t2_test(target_store, group_type, features_df):
    # Extract target vector
    target_row = features_df[features_df['Store'] == target_store][feature_columns].values[0]
    # Extract group vectors
    group = features_df[features_df['Type'] == group_type]
    group_X = group[feature_columns].values
    # n and p values for calculating the F-stat
    n, p = group_X.shape
    # Compute covariance matrix and its inverse
    cov = np.cov(group_X, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    # Mahalanobis distance from the target store to the group mean
    d = mahalanobis(target_row, group_X.mean(axis=0), cov_inv)
    # Convert T² to F-statistic
    F_stat = (n - p) * (d**2) / (p * (n - 1))
    df1 = p
    df2 = n - p
    p_value = 1 - f.cdf(F_stat, df1, df2)
    return d, F_stat, p_value
# Run test
for store in [33, 36]:
for group in ['A', 'C']:
d, F_stat, p_val = hotellings_t2_test(store, group, features_df_scaled)
print(f"\nStore {store} vs Type {group}:")
print(f" Mahalanobis Distance: {d:.3f}")
print(f" F-statistic: {F_stat:.3f}")
print(f" p-value: {p_val:.4f}")
Store 33 vs Type A: Mahalanobis Distance: 6.998, F-statistic: 14.604, p-value: 0.0001
Store 33 vs Type C: Mahalanobis Distance: 4.765, F-statistic: 4.541, p-value: 0.1229
Store 36 vs Type A: Mahalanobis Distance: 7.654, F-statistic: 17.474, p-value: 0.0000
Store 36 vs Type C: Mahalanobis Distance: 10.785, F-statistic: 23.262, p-value: 0.0140
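For reference, `hotellings_t2_test` treats the squared Mahalanobis distance of the target store from the group mean as the $T^2$ statistic and maps it to an $F$ distribution. A sketch of the conversion as implemented in the code above ($S$ is the group covariance matrix, $n$ the group size, $p$ the number of features):

```latex
T^2 = d^2 = (\mathbf{x} - \bar{\mathbf{x}})^\top S^{-1} (\mathbf{x} - \bar{\mathbf{x}}),
\qquad
F = \frac{n - p}{p\,(n - 1)}\, T^2 \;\sim\; F_{p,\; n - p}
```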
Breakpoint check with Bai-Perron Test (Insignificant)¶
!pip install ruptures
Successfully installed ruptures-1.1.9
import pandas as pd
import numpy as np
import ruptures as rpt
import matplotlib.pyplot as plt  # needed for plt.show() below
# Load the training data (assuming it's preloaded as 'training_df')
# If not, you would normally load it here
# Ensure 'Date' is in datetime format
training_df["Date"] = pd.to_datetime(training_df["Date"])
# Function to perform Bai-Perron test on aggregated data for a group
def detect_breakpoints(df, group_col):
    print(f"\n📍 Breakpoints in {group_col} (Bai-Perron Test):\n")
    for group, group_df in df.groupby(group_col):
        # Aggregate sales by date
        sales = group_df.groupby("Date")["Weekly_Sales"].sum().reset_index()
        sales.set_index("Date", inplace=True)
        series = sales["Weekly_Sales"].values
        if len(series) < 20:
            print(f"{group_col} {group} does not have enough data.")
            continue
        # Bai-Perron style: detect multiple change points with PELT
        model = rpt.Pelt(model="rank").fit(series)
        breakpoints = model.predict(pen=3)
        if len(breakpoints) > 1:
            print(f"{group_col} {group} → Breakpoints at indices: {breakpoints[:-1]}")
            rpt.display(series, breakpoints)
            plt.show()
# Run for Type-wise data
detect_breakpoints(training_df, "Type")
detect_breakpoints(training_df, "Category")
📍 Breakpoints in Type (Bai-Perron Test):
Type A → Breakpoints at indices: [30, 40, 45, 90, 100, 105]
Type B → Breakpoints at indices: [30, 40, 45, 90, 100, 105, 135]
Type C → Breakpoints at indices: [90]
Type D → Breakpoints at indices: [25, 65, 85]
📍 Breakpoints in Category (Bai-Perron Test):
Category Erratic Declining → Breakpoints at indices: [25, 30, 40, 45, 70, 75, 90, 100, 105, 130]
Category Erratic Increasing → Breakpoints at indices: [90, 100]
Category Reliable → Breakpoints at indices: [65, 120, 135]
Category Standard → Breakpoints at indices: [25, 30, 40, 45, 65, 95, 100, 105, 130, 135]
Category Weak → Breakpoints at indices: [15, 25, 35, 50, 65, 75, 90, 100, 105, 120, 125, 130, 140]
Among the store types, types A and B follow a similar seasonal pattern, which produces their breakpoints. For store type C, however, we see a breakpoint around index 90, which corresponds with the trend graph shown earlier. This would be something to consider when designing customized forecasting algorithms.
In the department-category-wise results, we see breakpoints corresponding to the increasing and declining trends of the Erratic Increasing and Erratic Declining departments, respectively. We also see a breakpoint corresponding with an increasing sales trend in the Reliable departments. Most of these also occur around index 90.